Geometric Layout Analysis Techniques for Document Image Understanding: a Review
Abstract
Document Image Understanding (DIU) is an interesting research area with a wide variety of challenging applications. Researchers have worked on this topic for decades, as the scientific literature attests. The main purpose of this report is to describe the current status of DIU, with particular attention to two subprocesses: document skew angle estimation and page decomposition. Several algorithms proposed in the literature are concisely described and organized in a novel classification scheme. Some methods proposed for evaluating page decomposition algorithms are also described. The current status of the field and its open problems are critically discussed, and some considerations on logical layout analysis are reported.
Related documents
Geometric Structure Analysis of Document Images: A Knowledge-Based Approach
Geometric structure analysis is a prerequisite to create electronic documents from logical components extracted from document images. This paper presents a knowledge-based method for sophisticated geometric structure analysis of technical journal pages. The proposed knowledge base encodes geometric characteristics that are not only common in technical journals but also publication-specific in ...
Document Image Dewarping Based on Text Line Detection and Surface Modeling (RESEARCH NOTE)
Document images produced by a scanner or digital camera usually suffer from geometric and photometric distortions, both of which degrade the performance of OCR systems. In this paper, we present a novel method to compensate for undesirable geometric distortions, aiming to improve OCR results. Our methodology is based on finding text lines by dynamic local connectivity map and then applying a l...
Description Language Approach (c) Form Registration Approach (d) A Form Document Processing System (vi) Major Techniques (a)
Surveys of the basic concepts and underlying techniques are presented in this chapter. A basic model for document processing is described. In this model, document processing can be divided into two phases: document analysis and document understanding. A document has two structures: geometric (layout) structure and logical structure. Extraction of the geometric structure from a document refers t...
An integrated approach to document decomposition and structural analysis
A document image is a visual representation of a paper document, such as a journal article page, a cover page of facsimile transmission, office correspondence, an application form, etc. Document image understanding as a research endeavor consists of developing processes for taking a document through various representations: from scanned image to semantic representation. This paper describes docum...
Document Analysis and Recognition
The goal of document image understanding is to extract and meaningfully classify individual data from paper-based documents. To date, many methods and approaches have been proposed for the recognition of various kinds of documents, for various technical problems in extending OCR, and for the requirements of practical use. Of course, though the technical research issues in the early st...
Publication date: 1998